The coral Acropora loripes genome reveals an alternative pathway for cysteine biosynthesis in animals

The metabolic capabilities of animals have been derived from well-studied model organisms and are generally considered to be well understood. In animals, cysteine is an important amino acid thought to be exclusively synthesized through the transsulfuration pathway. Corals of the genus Acropora have lost cystathionine β-synthase, a key enzyme of the transsulfuration pathway, and it was proposed that Acropora relies on the symbiosis with dinoflagellates of the family Symbiodiniaceae for the acquisition of cysteine. Here, we identify an alternative pathway for cysteine biosynthesis in animals through the analysis of the genome of the coral Acropora loripes. We demonstrate that these coral proteins are functional and synthesize cysteine in vivo, revealing previously unrecognized metabolic capabilities of animals. This pathway is also present in most animals but absent in mammals, arthropods, and nematodes, precisely the groups to which most animal model organisms belong, highlighting the risks of generalizing findings from model organisms.


Step 8: Scaffolding with 10X reads
Making use of the barcodes in the 10X reads, the SSPACE-long scaffolded assembly was further scaffolded using ARCS (54).
First, 10X reads must be processed with LongRanger:
longranger basic --fastqs /path_to_10X_reads --id LRanger1
#Note: some headers can give rise to errors with some programs; changing the FASTA headers to simple names might be needed for correct processing.
The LongRanger-processed reads are then mapped to the genome with bwa.

Step 9: Gapclosing with PacBio reads
The scaffolded genome was gap-closed with PacBio reads using LR_Gapcloser (55):
LR_Gapcloser.sh -i Fixed_header_ARCS_Acropora_Loripes.fasta -l \
Lordec_trimmed_Canu_Pacbioreads.fasta -t 32 -r 5
LR_Gapcloser performs several iterations of gap closing; in this case, iteration 4 was the last iteration that showed any gap-closure improvement.
Step 10: Reduction of duplication

Step 11: Preliminary annotation
The genome was masked by creating a new repeat library with RepeatModeler and masking the genome with RepeatMasker (60,81,82). RNA-Seq reads were mapped to the genome with HiSat2 (61) and provided as hints to BRAKER2 (18,57,58,83,84).

Step 12: Addition of missing scaffolds
As mentioned in step 4, two different assemblies were processed, one running Purge Haplotigs with parameter -a 41 and another with -a 66. The assembly generated with -a 41 had a size in line with the expected genome size, whereas the assembly generated with -a 66 showed the best BUSCO completeness score, albeit with a larger genome size. Both processed genomes were annotated with BRAKER2 and assessed with BUSCO. Scaffolds containing BUSCOs that were missing in the -a 41 genome but present in the -a 66 genome were added to the -a 41 genome. The difference in missing BUSCOs between the two assemblies was 0.8 %; thus, not many scaffolds were added.
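The scaffold selection in step 12 can be illustrated with a short shell sketch. It is not the procedure actually used; it assumes that each BUSCO run produced a tab-separated table whose first three columns are the BUSCO identifier, its status, and the name of the sequence carrying it (as in a genome-mode full table), and the function name scaffolds_for_missing_buscos is hypothetical. If BUSCO was run on predicted proteins instead, the third column would hold gene identifiers that would first have to be mapped back to scaffolds.

```sh
# Hypothetical helper: list scaffolds of assembly B (e.g. -a 66) carrying
# BUSCOs that are reported as "Missing" in assembly A (e.g. -a 41).
# Assumed table layout (tab-separated): busco_id, status, sequence, ...
scaffolds_for_missing_buscos() {
    # $1: BUSCO table for assembly A, $2: BUSCO table for assembly B
    awk -F'\t' '
        /^#/ { next }                                   # skip comment lines
        NR == FNR {                                     # first file: assembly A
            if ($2 == "Missing") missing[$1] = 1
            next
        }
        # second file: assembly B; keep hits for BUSCOs missing in A
        ($1 in missing) && ($2 == "Complete" || $2 == "Duplicated") && $3 != "" {
            print $3
        }
    ' "$1" "$2" | sort -u
}
```

The resulting list of scaffold names could then be used to extract the corresponding sequences from the -a 66 assembly. Whether fragmented BUSCOs should also count as present is a design choice that this sketch leaves out.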

Step 13: Reduction of duplication after addition of scaffolds
As extra scaffolds from one assembly were added to another, regions of these new scaffolds could potentially duplicate scaffolds that were already present. For this reason, another round of HaploMerger2 was run.

Step 14: Final scaffolding with 10X reads
Making use of the barcodes in the 10X reads, the HaploMerger2-reduced assembly was further scaffolded using ARCS.
#Note: some headers can give rise to errors with some programs; changing the FASTA headers to simple names might be needed for correct processing.
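The header simplification mentioned in the note can be done in many ways; one minimal sketch, using only awk, is shown here. The function name simplify_fasta_headers and the default prefix "scaffold" are illustrative assumptions, not names used in the original pipeline.

```sh
# Hypothetical helper: replace every FASTA header with a simple sequential
# name (prefix_1, prefix_2, ...), leaving sequence lines untouched.
simplify_fasta_headers() {
    # $1: input FASTA; $2: prefix for the new names (default: scaffold)
    awk -v prefix="${2:-scaffold}" '
        /^>/ { n++; print ">" prefix "_" n; next }   # rename header lines
        { print }                                    # copy sequence lines as-is
    ' "$1"
}
```

For example, `simplify_fasta_headers assembly.fasta scaf > assembly.simple.fasta` would write the renamed assembly to a new file (both file names are placeholders). Keeping a table of old and new names is advisable if the original identifiers are needed later.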
Initial assemblies with Canu (48), using PacBio reads, led to a genome assembly of 900 Mb, more than twice the expected size. Assembling with the diploid-aware assembler Supernova (97), using 10x reads, resulted in a smaller 517 Mb assembly. While the Supernova assembly was closer to the expected genome size, it was extremely fragmented, with 47,000 scaffolds. The increased assembly size is likely due to a high degree of heterozygosity in the genome, as the k-mer profile produced by GenomeScope shows a bimodal distribution (fig. S3). Bimodal distributions are characteristic of highly heterozygous genomes, while a simple Poisson distribution is observed in homozygous genomes (98). Moreover, GenomeScope predicted a heterozygosity rate of 2.2 % (fig. S4), which is greater than the 1.9 % for the highly heterozygous genome of the Pacific oyster Crassostrea gigas (99). C. gigas has been commonly used as a reference for highly heterozygous genomes (100) and has been used for benchmarking different bioinformatic tools such as GenomeScope and the repeat-aware assembler Platanus (98). This resulted in an estimate of ~400 Mb, in line with the literature. However, Canu produced an assembly of more than 900 Mb, likely due to the high degree of heterozygosity of the extracted DNA.
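As a rough illustration of what a bimodal k-mer profile means in practice, the sketch below reports local maxima in a k-mer histogram. It is only a crude check under stated assumptions and not how GenomeScope works (GenomeScope fits a statistical model to the profile): the input is assumed to be a whitespace-separated two-column histogram (coverage, number of distinct k-mers) sorted by coverage, and the function name kmer_peaks and the default cutoff of 5 are hypothetical.

```sh
# Hypothetical helper: print the coverage values at which the k-mer histogram
# has a local maximum, ignoring coverages below a cutoff to skip the
# low-coverage peak caused by sequencing errors.
kmer_peaks() {
    # $1: two-column histogram (coverage, count), sorted by coverage
    # $2: minimum coverage to consider (default: 5)
    awk -v min="${2:-5}" '
        BEGIN { n = 0 }
        $1 >= min { cov[n] = $1; cnt[n] = $2; n++ }
        END {
            # a peak is a bin strictly higher than both of its neighbours
            for (i = 1; i < n - 1; i++)
                if (cnt[i] > cnt[i-1] && cnt[i] > cnt[i+1]) print cov[i]
        }
    ' "$1"
}
```

Two reported peaks, with the first at roughly half the coverage of the second, would be consistent with the heterozygous and homozygous peaks of a highly heterozygous genome. Real histograms are noisy, so some smoothing would be needed before this check is reliable.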
The high heterozygosity, which makes genome assembly extremely challenging, is likely due to the combined effects of collecting DNA from a wild organism and extracting DNA from coral sperm, which consists of a mix of non-identical haploid genotypes because sperm is formed via meiosis. To reduce the negative effects of heterozygosity, we assembled the genome in several steps. Briefly, PacBio reads were error-corrected with LoRDEC (47) using debarcoded 10x reads. The error-corrected reads were then assembled with Canu, and the assembly was collapsed to remove redundant regions with Purge Haplotigs (49) and HaploMerger2 (56). The assembly was then scaffolded using 10x barcoded reads with ARCS (54). For more details about the complete assembly process, please refer to Supplementary Methods. A comparison of the assembly with other sequenced Acropora species (4,64,69) can be found in Table 1.

Repetitive elements in Acropora
Repetitive element contents were compared across Acropora by generating de novo repeat libraries for the genomes of Acropora loripes, Acropora tenuis (62), and Acropora digitifera (4), with RepeatModeler followed by the masking of their genomes with RepeatMasker. Repetitive elements accounted for 43.13 % of the A. loripes genome (table S1), similar to the 43.39 % found in A. tenuis (table S2), but higher than the 30.99 % and 34.55 % of A. digitifera (table S3) and A. millepora (64), respectively (Table 1). In all genomes, interspersed repeats accounted for the majority of the repetitive sequences. Of the classified repeats, Long Interspersed Nuclear Elements (LINEs) were the most abundant repetitive element, followed by DNA transposable elements (tables S1-3).
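The repeat percentages above come from the RepeatMasker summary tables. As an independent sanity check, the masked fraction can also be recomputed directly from a masked genome. The sketch below assumes a soft-masked FASTA file (repeats in lowercase, as produced when RepeatMasker is run with soft-masking enabled); the function name masked_percent is hypothetical.

```sh
# Hypothetical helper: percentage of bases that are soft-masked (lowercase)
# in a FASTA file, relative to the total sequence length (Ns included).
masked_percent() {
    # $1: soft-masked FASTA
    awk '
        /^>/ { next }                            # skip header lines
        {
            total += length($0)
            line = $0
            masked += gsub(/[acgtn]/, "", line)  # gsub returns the number of lowercase bases
        }
        END { if (total > 0) printf "%.2f\n", 100 * masked / total }
    ' "$1"
}
```

Small differences from the RepeatMasker table are to be expected, for example depending on how runs of N are counted.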
Of these, 710 consisted of single-copy genes, of which 422 were used to reconstruct a phylogenetic tree using the octocoral D. gigantea as an outgroup (fig. S4). The phylogenetic tree clearly resolved Actiniaria, Complex corals, and Robust corals, showing the calcifying Scleractinia (Complex and Robust corals) diverging from the non-calcifying Actiniaria.
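Selecting single-copy orthogroups, as described above, amounts to keeping the orthogroups in which every species contributes exactly one gene. The sketch below shows this filter under an assumed input format: a tab-separated table with a header row, the orthogroup identifier in the first column, and one gene-count column per species (any summary column such as a total would have to be removed first). The function name single_copy_orthogroups is hypothetical, and the tool and filtering criteria actually used may differ.

```sh
# Hypothetical helper: print the IDs of orthogroups with exactly one gene
# in every species column of a gene-count table.
single_copy_orthogroups() {
    # $1: tab-separated table: orthogroup ID, then one count column per species
    awk -F'\t' '
        NR == 1 { next }                         # skip the header row
        {
            ok = (NF > 1)
            for (i = 2; i <= NF; i++)
                if ($i != 1) { ok = 0; break }   # any count other than 1 disqualifies
            if (ok) print $1
        }
    ' "$1"
}
```

Counting the lines of the output gives the number of single-copy orthogroups for a given table.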
When comparing Acropora species, 10,230 orthogroups were shared across all species (fig. S4), of which 312 were unique to the genus Acropora. Gene Ontology (GO) enrichment analyses of the orthogroups unique to Acropora revealed enrichment in terms related to the regulation of apoptosis (data S2). Interestingly, this enrichment was also found in orthogroups unique to Complex (5,557) and Robust corals (1,534) (fig. S4 and data S2); however, it was not observed in orthogroups unique to the Complex coral P. lutea (data S2), suggesting that P. lutea may have lost some of the genes involved in the regulation of apoptosis. Orthogroups unique to Complex corals were also enriched in terms related to immune response and the G protein-coupled receptor signaling pathway (data S2). Moreover, they were enriched in the term "modification of morphology or physiology of other organism involved in symbiotic interaction". This is interesting, as stony corals form a symbiosis with dinoflagellates, although the term was not enriched in Robust corals. When examining orthogroups unique to and present in all species forming a symbiosis with Symbiodiniaceae, i.e. Complex corals, Robust corals, and Exaiptasia pallida, only 37 orthogroups were identified. Among these, the term BRCA1-A complex, which is involved in DNA damage repair (101), was enriched. However, the number of orthogroups unique to and present in all symbiotic species was too small for a robust statistical analysis of GO enrichment, and the remaining orthogroups should not be dismissed. Notably, most of these orthogroups had terms related to transmembrane transporter activities, primarily ion channels.
Ion channels are associated with microbial symbiosis in plants (102), and it is possible that they also play a role in cnidarian-Symbiodiniaceae symbiosis. Similar to Complex corals, orthogroups unique to Actiniaria (5,436) were also enriched in the G protein-coupled receptor signaling pathway, but not in genes related to immune response or symbiosis. The orthogroups unique to the octocoral D. gigantea (1,174) were enriched in terms related to uridylyltransferase activity, which is involved in the regulation of nitrogen assimilation (103,104), and to positive regulation of developmental process; both terms are related to growth. Results from the GO enrichment analyses can be found in data S2.

O-acetylserine sulfhydrylases in nematodes
Interestingly, O-acetylserine sulfhydrylases (CysK) resembling those in plants have been identified in C. elegans (cysl proteins), where in vitro assays showed that these proteins can produce cysteine (105). This would suggest an alternative route for cysteine biosynthesis through the sulfate assimilation pathway that is found only in nematodes and is missing in all other animals.
However, the remaining proteins required for the biosynthesis of cysteine through the sulfate assimilation pathway are not found in C. elegans, and the biosynthesis of cysteine is considered to be carried out through the transsulfuration pathway (105). Furthermore, these enzymes are not consistently found across nematodes (105) and have been associated with other functions, such as the response to hypoxia, hydrogen sulfide, and hydrogen cyanide (105-107). This contrasts with our

Contains the information and sequences of cysteine synthases and MetX enzymes used in this study.